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An important problem of reconstruction of diffusion network and transmission probabilities from 
the data has attracted a considerable attention in the past several years. A number of recent 
papers introduced efficient algorithms for the estimation of spreading parameters, based on the 
maximization of the likelihood of observed cascades, assuming that the full information for all the 
nodes in the network is available. In this work, we focus on a more realistic and restricted scenario, 
in which only a partial information on the cascades is available: either the set of activation times 
for a limited number of nodes, or the states of nodes for a subset of observation times. To tackle 
this problem, we first introduce a framework based on the maximization of the likelihood of the 
incomplete diffusion trace. However, we argue that the computation of this incomplete likelihood 
is a computationally hard problem, and show that a fast and robust reconstruction of transmission 
probabilities in sparse networks can be achieved with a new algorithm based on recently introduced 
dynamic message-passing equations for the spreading processes. The suggested approach can be 
easily generalized to a large class of discrete and continuous dynamic models, as well as to the cases 
of dynamically-changing networks and noisy information. 

PACS numbers: 05.10.-a, 02.50.Tt, 89.75.-k 


Introduction: Learning an underlying graphical 
model from observed data is a long-standing and im¬ 
portant practical problem in statistical physics, machine 
learning and computer science. Recent years have seen 
a renewed interest in development of fast and efficient 
algorithms to carry out this reconstruction problem 
in diverse contexts, such as gene regulatory networks 
[T], biopolymer structures [2], neuroscience [21 0] and 
sociology [5], from large datasets becoming available in 
these fields. An ongoing effort of scientific community 
has allowed to develop a number of techniques for solving 
the inverse problem for simple, but widely applicable 
models, such as the Ising model in static isHin] and 
dynamic [nHS] settings. However, inference of param¬ 
eters in a large class of important models of diffusion 
type has been less studied so far. Among a broad 
range of disordered and out-of-equilibrium dynamic 
models, a particular attention is devoted to cascading 
processes which are used for modelling phenomena in 
various domains: epidemic and rumor spreading dain], 
spreading of information and innovations in real-world 
and virtual social networks |18[ dS], avalanches in 
magnetic and glassy systems [2D], activation cascades in 
neural networks |21] . etc. 

Contrary to the case of recurrent spreading models, 
in which the network reconstruction can be achieved via 
observing one realization of dynamics of sufficient du¬ 
ration [221123] ) learning in the case unidirectional (also 
called progressive) dynamics requires a certain number 
of independent cascades with varying initial conditions. 
Given a subset of activation times for several realizations 
of the spreading process, the reconstruction problem aims 


to infer the transmission parameters of the model. In 
Bayesian framework, a common inference method relies 
on the maximization of the likelihood of observed infor¬ 
mation. In the case of fully observed cascades, this ap¬ 
proach has been indeed suggested in a number of recent 
papers [24H28] . leading to distributed convex optimiza¬ 
tion algorithms and outperforming previously suggested 
heuristics. However, in the majority of realistic applica¬ 
tions, it is very difficult or even practically impossible to 
monitor the state of each and every node over the whole 
duration of the diffusion process; hence a need to develop 
reconstruction algorithms which would be able to infer 
the parameters of the model in the presence of hidden 
nodes or incomplete time information on the cascades, as 
well as being robust with respect to the noise in the ob¬ 
servations. Despite the importance of this problem, the 
case of incomplete information in a dynamic process has 
been poorly addressed so far. The corresponding learning 
problem for kinetic Ising model has been treated in |29j 
by means of the path-integral approach and the mean- 
field methods. In the context of cascading processes, the 
work |30j addressed the network learning problem using 
relaxation optimization techniques, assuming as an in¬ 
put a full probabilistic trace for each node observed at a 
subset of times. 

In this Letter, we develop a systematic framework 
for parameter estimation in a spreading model from in¬ 
complete observations. As a first natural step in solv¬ 
ing this problem, we introduce two algorithms based on 
maximization of the likelihood of incomplete information 
via exact marginalization and approximate completion 
of missing data with the Monte Carlo (MC) sampling. 
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which however appear to be eminently costly; this repre¬ 
sents an important limitation of these schemes and makes 
their use impossible in the applications where a fast on¬ 
line learning is desired. As an alternative which would 
allow to considerably improve the computation time, we 
develop a new algorithm based on recently introduced dy¬ 
namic message-passing (DMP) equations for the spread¬ 
ing processes [3T]. These equations allow for an asymp¬ 
totically exact computation of marginal probabilities of 
node activation on loopy-but-sparse networks, and can be 
used as an approximate tool for solving computationally 
hard problems: recently, DMP equations have been ap¬ 
plied to the problem of inference of epidemic origin from 
a given snapshot of the process at a certain time |32j . 

Formulation of the problem: Let G = (V, E) be a con¬ 
nected undirected graph containing N nodes defined by 
the set of vertices V and the set of edges E. We ob¬ 
serve M independent realizations of cascades c, where 
each sample represents a set of activation times for 
the nodes in the network However, some in¬ 

formation on the cascades might be missing: the full in¬ 
formation can be written as E = Eq U E^, where Eq 
is the observed part of the cascades, and E^ represents 
the hidden part. For the sake of simplicity and defi¬ 
niteness, we assume that the activation process follows 
a discrete-time susceptible-infected (SI) model, which is 
defined as follows m-- each node i at time t G [0, T] can 
be in one of two states qi{t): susceptible, qi{t) = S, or 
infected, qi{t) = I. At each time step, an infected node 
j can transmit the information to one of its susceptible 
neighbors i on the interaction graph G with probability 
Oji, meaning that i changes its state with a probabil¬ 
ity Pt{S{i) I{i)) = 1 - nfcGai(l - akit[qk{t) = /]), 
where di denotes the set of neighbors of f; once the node 
is activated, it stays in the infected state forever. Note 
that the setting can be straightforwardly generalized to 
models with more complicated transition rules, such as 
SIR, threshold and rumor spreading models m- We as¬ 
sume that the cascades are simulated with the following 
initial condition: each node is independently drawn as 
infected with probability 1/A^, meaning that on average 
there is one “patient zero” at initial time; note, however, 
that it also means that some cascades have several ini¬ 
tial sources, while other cascades are trivial and do not 
contain any infected nodes. 

Our goal is to reconstruct the values of the couplings 
{ctij}{ij)^E = Ga- In the absence of any prior on the 
underlying model, Bayes’ theorem states that 

Ga = argmaxP(GQ | Eq) oc argmaxP(Ec) | Ga), (1) 

where Eq = {Eg}cg[i_M]- Hence, the task is to estimate 
efficiently the likelihood function P{Eo \ Ga)- Note that 
the formulation Q is valid for the case where the struc¬ 
ture of the graph is unknown (defining the problem on 
a fully-connected graph). However, in what follows and 
unless stated otherwise, we assume that the network G is 


known; treating the case of unknown graph with missing 
information would require some additional assumptions 
and constraints on the network structure. For the tests 
involving incomplete observations, we focus for definite¬ 
ness on the presence of nodes with hidden information, 
providing the study of other cases in the Supplemental 
Material (SM) [33]. 

Maximum likelihood estimator: If the information on 
all the nodes is available (E = E^), an efficient strategy 
would be to use a consistent maximum likelihood estima¬ 
tor (MLE), suggested in In the discrete formulation, 
the likelihood of the cascades, P(E | Ga), is given by: 

P(E|G„)=n n P,«|EV“,G„), (2) 

1<C<M 


where 


'rf-2 


P,(TnS>^G„)= I n l[{l-ak^l[Tf<t']) 

t'—O k^di 


1 - n (1 - <rf- Ij) ![< < T] . (3) 

\k^di / 


The estimation of the transmission probabilities Ga is 
given by the solution of the convex optimization problem 

Gq = argmin(-logP(E I Gq,)) , (4) 

which can be solved locally for each node i and its neigh¬ 
borhood due to the factorization of the likelihood under 
assumption of asymmetry of the couplings. In the case 
of partial observations, we need to consider the reduced 
MLE, performing a trace over the unknown activation 
times of the hidden nodes: 

P(Eo I G„) = ^ P(E I Ga). (5) 

{nlie-H 

An exact evaluation of ([^ is a computationally difficult 
high-dimensional problem with complexity proportional 
to P^, where El is the number of hidden nodes. Unfor¬ 
tunately, we found that this complexity can not be re¬ 
duced by approximating the objective function (§ with 
an appropriate MC sampling since the evaluation of the 
gradient should be very precise for convergence of the 
algorithm; see the SM [33] for more details. 

Heuristic two-stage algorithm: In order to keep the 
nice convexity properties of the full MLE, we introduce 
a modification of the scheme above which we also use as 
a benchmark (this algorithm will be referred to as HTS 
algorithm). The idea is to use two alternating stages at 
each step of optimization. First, we complete the missing 
information in the cascades E^ using the current estima¬ 
tion of the couplings Ga, i.e. update the activation times 
of the hidden variables as follows: 

E« = argmaxP(E | G^). 


( 6 ) 
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Again, an exact solution of ([^ still requires a number 
of operations proportional to , but in practice we ap¬ 
proximate the inference problem § using a MC sampling 
procedure. Second, we can solve the convex optimization 
problem using the “complete” S = SqUE-h, thus ob¬ 
taining a new estimation of Gq,; the procedure is repeated 
until convergence. This leads to an algorithm with com¬ 
plexity 0{\E\M + NMLh,t) at each step of optimization 
procedure, where \E\ is the number of edges, and Lh,t 
is a sampling parameter growing with T and [33j . 

Dynamic message-passing algorithm: A way to quan¬ 
tify the interdependence of activation times of different 
nodes is to use the dynamic equations that contain infor¬ 
mation about the correlations occurring in the spread¬ 
ing process. The suggested algorithm is based on the 
dynamic message-passing equations for diverse dynamic 
processes [3T]. According to the DMP equations for the 
SI model, the marginal probability of activation 

of node i € V a.t time Ti can be computed as 


m^Ti) = P^( 0 ) 


.k£di kGdi 


(7) 


for Ti > 0, with m*(0) = 1 — Pg(0), where Pg{0) is the 
probability that node i is initialized in the state S. The 
quantities 0^^'^it) are computed iteratively using the fol¬ 
lowing expressions: 

- 1 ) - - 1 ), ( 8 ) 

+ P|(0) n 0'^"(t-l)-P|(O) n (9) 

l^dk\i l^dk\i 


with the initial conditions = 1 and (^*“^■’ ( 0 ) = 

1 — ^ 5 ( 0 ), and dk\i denoting the set of neighbors of k 
excluding i. The proof that these equations are exact 
on trees and empirical studies of performance on random 
and real-world networks are discussed in m- 

Let us now explain the reconstruction algorithm based 
on the DMP equations. Given the data on the cas¬ 
cades Eo, we can compute the empirical initial condi¬ 
tions ^ 5 ( 0 ) and marginal probabilities simply 

given by the averages of activation times over different 
cascades at all nodes for which the information is known. 
The idea, reminiscent of what has been previously used 
in online learning of parameters in the context of artificial 
neural networks |34j . is to adjust the transmission proba¬ 
bilities Gq in order to minimize the mismatch J between 
the DMP-estimated and available empirical marginals at 
each time step: 

J = E = E E (10) 

t=o t=o ieo 

To this end, we use simple gradient descent: starting 
from some initial distribution of transmission probabili¬ 
ties, the couplings are updated as ^ a^r) — 


where e is the learning rate. The derivatives of the cost 
function (101 with respect to couplings can be expressed 
through ^ p>;^\t) and ^ for 

which the DMP-like equations can be easily written us¬ 


ing an explicit derivation of the equations §-(§ m- 
The update of the transmission probabilities is restarted 
from time zero until the convergence of the algorithm. 
Because of the averaging over the cascades, the result¬ 
ing computational complexity of an iteration step of the 
DMP algorithm is independent on M and E[ and is equal 
to 0{NdT), where d is the average degree of the graph. 


Performance of reconstruction algorithms: We test 
all the aforementioned algorithms on synthetic and real- 
world networks with the data generated from {cXij}(^ij')^E 
uniformly distributed in the range [0,1], and using T = 
10. We first illustrate their performance in the case of 
fully observed cascades. In the Fig. we present re¬ 
sults for the mean error of reconstruction per coupling 
on a tree network and on a connected component of 
an artificially-generated random graph with the Pareto 
power-law degree distribution of shape parameter 2.5 and 
minimum value parameter 1. As expected, MLE gives a 
much better prediction of couplings overall compared to 
the DMP algorithm, because of the loss of information 
due to the averaging over the statistics of cascades in the 
latter. Yet, in the case of imperfect measurements, this 
averaging may play a positive role for the quality of re¬ 
construction: in the SM [^, we show that already in 
the presence of weak noise a naive application of MLE 
leads to a quite poor reconstruction since this scheme 
crucially depends on the detailed information contained 
in the data, while DMP remains much more robust to the 
noise. Still, as demonstrated in the insets of the Fig. 
assuming the perfect measurements we see that whereas 



FIG. 1. (Color online) Main figures: Comparison of mean 
error on the reconstructed couplings by MLE and DMP al¬ 
gorithms initialized at aij = 0.5 for all edges {i,j) G if as a 
function of the number of fully observed cascades M for (a) 
a tree network with A = 50 and (b) a connected component 
of a power-law network with N = 53. Insets: Scatter plots of 
transmission probabilities reconstructed by DMP algorithm 
for M — 10® versus true couplings for (a) the tree and (b) the 
power-law network. 
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FIG. 2. (Color online) Comparison of reconstruction perfor¬ 
mance in terms of the mean reconstruction error for different 
values of M and H of MLE, DMP*-1-ML and HTS algorithms 
on (a), (b) a connected component of a power-law network 
with A'’ = 20 and (c) a real-world Sampson’s monastery net¬ 
work with = 18 nodes [33 [3S]; the topology of the latter 
is visualized in (d) using Gephi |37| . with darker color in¬ 
dicating the randomly chosen location of nodes with hidden 
information used in the plot (c). 


on a tree network the DMP algorithm yields almost a 
perfect reconstruction of couplings for a large number of 
observed cascades, it makes clearly wrong prediction of 
couplings in the vicinity of very short loops present in 
the small power-law network due to the inaccuracy of 
the DMP equations on graphs with small loops and to 
the degeneracy of solutions leading to the same values of 
marginal probabilities. 


In order to reduce the error of the DMP equations due 
to short loops which are present in most of realistic net¬ 
works, we suggest a modification of the algorithm (which 
we denote DMP* in what follows) based on the following 
observation: if one chooses to run the DMP equations 
for only T' < I time steps, where I is the length of the 
shortest cycle in the network, and keep only T' terms in 
the sum (10), the marginals predicted by DMP will be 
exact since the spreading will effectively stay on a tree. 
The drawback of this choice consists in a further increase 
of the degeneracy of possible solutions, especially in the 
presence of hidden nodes; but this feature is proper to 
all considered schemes since the objective function land¬ 
scapes naturally develop local minima for loopy graphs. 
Thus aiming at reinforcing the convergence of algorithms 
towards the true solution, we choose to initialize them us¬ 
ing the following idea: since MLE is still supposed to give 
a good estimation of the parameters for visible nodes if 


one ignores the hidden nodes in its neighborhood |33j , we 
estimate and “freeze” this part of couplings using the fast 
local maximization of ([^, and optimize over all remain¬ 
ing couplings setting them equal to 0.5 at initial time. If 
this initialization strategy is used in the DMP algorithm, 
we denote the corresponding modification by DMP-I-ML. 

The Fig. is devoted to the tests in the presence of 
nodes with hidden information. Because of the large 
convergence times for MLE and HTS in the case of in¬ 
complete information, we were forced to perform tests 
on small and loopy networks. Note that if the hidden 
node is a leaf of the network, then no algorithm can 
reconstruct the transmission probability associated with 
the ingoing directed edge adjacent to this nodes; there¬ 
fore, we do not include such parameters in the compu¬ 
tation of the mean error throughout the Fig. As ex¬ 
pected, MLE demonstrates the best performance, but 
it is not practically applicable even for small networks 
of size N < 20, and we show only the first few values 
as reference points for other algorithms. Although in 
most of the cases the DMP*-|-ML algorithm is less pre¬ 
cise then the HTS approach, overall it demonstrates simi¬ 
lar reconstruction results, sometimes even outperforming 
HTS, as shown in the Fig.[^(c) for a small real-world net¬ 
work of friendship relationships extracted from an ethno¬ 
graphic study of community structure in a New England 
monastery |^|35], depicted in the Fig.[^(d). The whole 
difference lies in the comparison of computation times: 
for instance, computation of the point corresponding to 
H = 4, M = 10^ on a power-law network with N = 20 in 
Fig.[^(b) took a time of order of two weeks for MLE, one 
day for HTS and less than 30 seconds for the DMP algo¬ 
rithm. Basically, it means that the HTS algorithm is not 
applicable even for reasonably small networks, such as a 
network of retweets with N = 96 nodes [38]. Test of the 
DMP algorithm on this and other real-world networks 
[ 551155 ] , as well as other examples involving incomplete 
observations in time and noisy information are provided 
in the SM [ 55 ] . 

Conclusion and perspectives: An approximate solu¬ 
tion given by the DMP equations for spreading processes 
allowed us to introduce an efficient algorithm for the re¬ 
construction of transmission probabilities in the presence 
of hidden information, demonstrating comparable results 
with respect to the methods based on the maximization 
of incomplete likelihood at a substantially lower compu¬ 
tational cost. It can be used for large networks, provid¬ 
ing the best performance in the case of sparse networks. 
Let us indicate some possible generalizations and per¬ 
spectives. It would be useful to understand whether the 
performance of the DMP algorithm could be further im¬ 
proved by matching the two-point correlations. Along 
with applications to a range of other spreading models 
ED, the DMP algorithm can be straightforwardly gen¬ 
eralized to the case of continuous-time models using the 
corresponding version of the DMP equations [ 10 ] and to 
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dynamically-varying networks in the spirit of |27j using 
the time-dependent couplings An interesting fu¬ 

ture direction would be to adapt the DMP approach for 
the network structure learning under some restrictions on 
the class of networks one tries to reconstruct, e.g. using 
a £i-regularization [531[55]. Finally, in the spirit of active 
learning, it might be advantageous to optimize over the 
particular choice of a reduced number of costly measure¬ 
ments sufficient for an accurate reconstruction. Some of 
these directions are further discussed in the SM |55] . 
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Supplemental Material 

Efficient reconstruction of transmission probabilities in a spreading process from 

partial observations 

Audrey Y. Lokhov and Theodor Misiakiewicz 


In the Supplemental Material, we provide implementation details for all the algorithms presented in the main text, present 
some supportive plots and results, as well as discuss some possible future directions outlined in the main text. 


DETAILS ON THE IMPLEMENTATION OF THE ALGORITHMS 
Exact marginalisation for the MLE 

The estimation of model parameters in the MLE algorithm is based on the maximization of the log-likelihood of 
observed part of the cascades Eq = 


/:(Eo I ^ E log P(E^ I G„), (11) 

^ C=1 

where \ Go) in the case of nodes with hidden information represents the likelihood of the observed part of a 

cascade c: 


P(E^ I G„) = E I G„). 

{rflie-H 


( 12 ) 


The marginalization over unknown activation times in (12) is a hard high-dimensional integration problem, with an 
exponential complexity . A usual approximation scheme used in this kind of problems consists in evaluating the 


sum (12) using a biased Monte Carlo sampling. We have encountered the following problem on this way: the objective 


function is dominated by an ensemble of cascades with small likelihood, and since we need to evaluate (11) at each 


step of the gradient descent using the current estimation point of couplings (which may be very distinct from the true 
ones), we found that in practice the algorithm has convergence issues if (0 and the gradient 


1>^VP(S^|G„) 
V/:({So|,e[i,M] I G.) - - 2^ P(E^ I G„) 


(13) 


are evaluated approximately, even with a large sampling parameter. Hence, we have used an exact evaluation of the 
integrals for the MLE algorithm throughout this work, also having in mind the purpose to produce reference points 
for comparison of the accuracy for other suggested algorithms with a lower computational complexity. Since one 
needs to evaluate 0 and ( |13[ ) for each of M independent cascades, the resulting computation complexity of the 
MLE algorithm for each step of the gradient descent is 0{NT^M). In order to speed up the convergence, we first 
performed a rough gradient descent using a renormalized gradient for a fixed number of steps to get relatively quickly 


to the proximity of the minimum, and only then used the exact values of the gradient (13) until the convergence of the 
algorithm. As a criterion for convergence, we required the variation of the objective function (11) in two successive 
steps to be smaller then a certain threshold Smle', the value Smle = 10“® was used in our simulations. 


Completion of cascades in the HTS algorithm 

Although we were not able to use approximation schemes for the MLE scheme, the Monte Carlo sampling appears 
to be an important element of the heuristic two-stage algorithm. In the first stage of the algorithm, we attempt to 
complete the missing activation times in each cascade c with their most probable values given the current estimation 
of couplings Ga, i.e. to solve the following optimization problem: 


{Th}hG'H = argmaxP({rf}ieo,{T^}?,e« | G^). 


(14) 
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Again, the search space for the missing activation times is in general of the size T^, but we significantly reduce the 
computational time by exploiting the following idea: the likelihood is non-zero only for the activation times which 
form a possible cascade on the graph Ga- Hence, for each cascade c, we sample Lh,t auxiliary cascades starting from 
the source points in the visible part of the cascade {i \ i G 0 ,t^ = 0} (if there are any) and from random sources in 
the hidden part H selected with the probability 1/iV. We then choose the subset of activation times for the nodes in 
the hidden part H (from one of the auxiliary cascades) which maximises the right hand-side of (141 as a solution of 
the first step of the algorithm. 

The value of the sampling parameter Lh,t should be small enough to improve the speed of the algorithm, and 
large enough to allow for a precise inference of parameters and to lead to a successful convergence. Intuitively, it is 
clear that Lfj j' should grow as & H and T increase, but it is hard to get an exact dependency. In practice, we found 
that a good convergence of the algorithm is ensured for the studied networks of size fV ~ 20 and T = 10 with the 
following values: Lh,t = 100 for i? < 3 , Lh,t = 1000 for 4 < 77 < 6 and Lh,t = 10000 for 7 < 77 < 10. The 
computational complexity of completion of M cascades using the MC sampling is 0{NMLh.t)- In the second stage 
of the algorithm, we need to preprocess the hence “complete” data of the cascades for the solution of the convex 
optimization problem (4) of the main text, and this requires 0(|7f|7V7) of operations. The two stages are alternated 
until the global convergence of the algorithm, which we establish as a variation of two consecutive estimations for the 
transmission probabilities {ctij}(ij)^E is less then the threshold Shts- With the values of Lh,t given above, we used 
Shts = 10“^ in all the simulations. 


Computation of the gradient in the DMP algorithm 


In this section, we will provide details for the derivation of the dynamic message-passing equations that we use to 
compute the gradient of the cost function 


1 




ieo 


at each iteration step of the DMP algorithm: 

dJ{t) 

doir<i 


= Ekw-™^w] 


iGO 


dOLrfi 


For convenience, let us recapitulate the DMP equations: 


I)_ , 

.k£di k£di 

- 1 ) - - 1 ), 

= {1 - - 1) + PliO) _ 1) _ p|(0) Yl 


lGdk\i 


lGdk\i 


Using the equation ( |18| ), we have 
dm^(t) 


dar 


= pm 




nl—^i 


kGdi 


lG.di\k 


dar^i 

k^di l£di\k 


it) 


Let us introduce useful notations: 


dars 


omit) 


dar 


^ €Tit)- 


(15) 


(16) 


(17) 

(18) 
(19) 


( 20 ) 


( 21 ) 


Since the initial dynamic messages (0)}(ij)g£; and (^0)}{ij)eE are independent on the couplings, we have 
PrJ^^iO) = ( 7 ^J^*( 0 ) = 0 for all k, i, r and s, and these quantities can be computed iteratively using the analogues of 
the equations (18) and (19) of the main text: 

P^it) =pmit - 1) - ak^qmit- 1) - m^it - l)l[fc = r,i = s], 


q^it) = (1 - aki)qmit - 1) - m^t - l)l[k = r,i = s] 


Psio) E Pr 7 Ht -1) n -1) - pim E 'w n 

l^dk\i n^dk\{i,l} l^k\i n^dk\{i,l} 


( 22 ) 

( 23 ) 
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Hence, at each time step, we compute the marginals using the BMP equations ([T7|-([I^, and use (16) and (20l-(23) 
to update the couplings. In principle, at each iteration step of the algorithm we could run the BMP equations for all 
T time steps with the current estimation of the couplings, and only then update the transmission probabilities using 
the derivative of the total cost function J = J2t=o found that the “online”-like update at each time step in 

the spirit of [33] leads to a faster convergence of the algorithm. An intuition behind this choice is as follows: instead 
of accumulating the error due to the current estimation of the couplings through the whole process, we adjust the 
couplings progressively as the process spreads through the network. 

In practice, we observed that simple gradient descent with a fixed learning rate e demonstrated good convergence 
to the optimum, and this is the procedure that we used for producing all the plots in this paper. For most of the plots, 
we used e = 5.0, and the tolerance on the change of the objective function 6dmp = 10“^^ as a stopping criterion for 
the algorithm. We saw that the reconstruction results can be improved with a modification of the algorithm BMP* 
using a reduced number of time steps T' of order of the length of the shortest cycle in the network for an increased 
accuracy of the BMP equations. It would be interesting to see if the convergence properties of the algorithm in very 
hard cases could be further improved by exploring two straightforward extensions: 

1. Adding to the cost function J terms that would reinforce a matching of the two-point correlations corresponding 
to the probabilities of observing mean probabilities of pairs {Si{t)Sj{t)), {Si{t)Ij{t)) and {Ii{t)Ij{t)) for {ij) G E. 
In fact, these two-point correlations at equal times can be computed within the BMP approach: the basic relation 


is {Si{t)Sj{t)) = where 


Ps"'it) = pm n 


(24) 


k^di\j 


has a meaning of the probability that the node i is in the state S at time t in an auxiliary cavity dynamics 
Dj^ in which the node j is fixed to the state S for all times; see |3T| for more details. Once {Si{t)Sj{t)) are 
computed, we immediately obtain 


{s.mit)) = m) - m)m) = i - m) - imsm- 


(25) 


Since correlations in (25) can be expressed from the single-node marginals and correlations {Si{t)Sj{t)), we 
may try to reinforce only the S — S correlations. We have performed several tests using the following modified 
objective function: 


{ij)&Eo 


(26) 


where Eq represents the ensemble of edges between observed nodes in G, and A{Si{t)Sj{t))t = {Si{t + l)Sj{t + 
1))* — {Si{t)Sj{t))t, where denotes empirical correlations obtained from averaging over the given data. The 
gradient of J'{t) can be easily written, since the derivative of a BMP-computed correlation reads 


dcxj-s 


d 


-it) 


dar 


= {pmy 


r'rs 


{t) 


/ ^ 0k^i 
ykei\j 


it) 






(27) 


Our preliminary tests are rather inconclusive: we found that while in general the minimization of (26) indeed 


improves the convergence speed and the accuracy of solution in the case of sparse networks and small number 
of nodes with missing information, it may degrade the quality of reconstruction in the case of loopy networks 
and a large number of hidden nodes with respect to the use of J(t), presumably because the inclusion of 
correlations amplifies the error of the BMP equations in these cases. Further investigations are needed to get a 
clear understanding under what circumstances the use of correlations can be advantageous. 

2. Using the information contained in the second derivatives of the cost function J for a better control over the 
convergence. Indeed, the second derivatives can be computed in the same message-passing way as the gradient; 
it would, however, involve manipulations with the sixth-order tensors, as opposed to the fourth-order quantities 


used in the computation of the gradient ( |^ and (23). For example, if we denote 




darsdCKu 


— ^k^i 

— Hrs.uv 


(t), and since from ( [l7| we have rrPit) = Pgit — I) — Ts(t) for t > 0, where 

p^sit) = pm n 

k^di 


( 28 ) 
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it is sufficient to compute 


d^Psit) 

dcXrsdttuv 


pm 


k^di 


yrs.uv 


it) n 

lGdi\k 




-v^~ 

fuv 


'\t) E Pr 

m^i\k 


‘w n 

lGdi\{k,m} 




(29) 


(<) following dynamic message-passing equations obtained as a derivative of (221 and (231 with respect 
(t) is indeed a six-tensor, which makes it use less practical; we did not use it in our 


to auv Note that p: 
simulations since the observed convergence was already good enough. 


^k—>i 
■^rs.uv \ 


ADDITIONAL SUPPORTIVE PLOTS AND RESULTS 


Reconstruction of transmission probabilities in the case of noisy information 


In this section, we show that the DMP algorithm is naturally adapted for an efficient reconstruction in the case 
where the activation times are observed with some noise fluctuating around the true values of the activation time. 
The reason for that lies in the averaged origin of the empirical marginal probabilities used as an input for the DMP 
algorithm: while this averaging of the original data may represent a certain drawback since some detailed information 
on the cascades is lost, in the case of the observations corrupted by noise this procedure has a clear advantage because 
of the effective averaging over the fluctuations. As a simple test, we have perturbed all the observed activation times 
{Ti}iGV for c = 1... M cascades uniformly with a noise of randomly chosen sign, with absolute value 

distributed according to the Poisson distribution with mean p — 0.1. The results of a naive application of MLE and 
DMP algorithms are presented in the Fig. We see that since DMP algorithm starts to yield better reconstruction 
results with respect to the MLE which uses a detailed information on the cascades and performs better in the noiseless 
case. This relative robustness to noise represents a useful property of the DMP algorithm. 
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FIG. 3. (Color online) Main figures: Comparison of mean error on the reconstructed couplings by MLE and DMP algorithms 
initialized at aij — 0.5 for all edges {i,j) G if as a function of the number of fully observed cascades M in the presence of a 
weak Poissonian noise for (a) a tree network with A = 50 and (b) a connected component of a power-law network with N = 53. 
Insets: Scatter plots of transmission probabilities reconstructed by DMP algorithm for M — 10® versus true couplings for (a) 
a tree and (b) a power-law network. 


Reconstruction of transmission probabilities in the case of incomplete information in time 

Let us discuss another possible type of missing information, when we do not have access to the state of the network 
at all times, but record its state only at a certain set of observation times To = {t^}k=i,...,K■ For simplicity and 
without loss of generality, let us assume that the state of the network is recorded at time intervals of equal duration 
A, meaning that Vfc, = A. Basically, for each cascade c and node i we don’t know the exact value of its 

activation time rf, but only know that it falls inside an interval ]{k'[ — I)A, kfA], A special value A = 1 corresponds 
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to the case of full information. Once again, MLE requires an exponential number of evaluations proportional to A^, 
so we do not consider it here. 


HTS algorithm 


Similarly to the case of nodes with hidden information, at the first stage of the algorithm, for each cascade c we 
need to find the activation times which maximize the likelihood given the intervals they belong to and our current 
estimation of transmission probabilities: 

= arg max P({r,f}igv | G„). (30) 

r.G](fcf-l)A,fc=A] 


Again, we choose to approximately solve this optimization problem with a MC sampling of La cascades, using as 
sources {i\i £ V,t^ = 0}. If the sample maximizing (30) contains activation times outside of the allowed interval, we 
fix them to the values found at a previous time step. The algorithm is initialized with A, and on the networks 

of size A = 20 we used La = 1000 with A = 2 and La = 10000 with A = 3 and A = 4, the stopping threshold 
Shts = 10“^. 


DMP algorithm 


In the case of the incomplete information in time, we choose the following objective function for the DMP algorithm: 

K-l 


J" = 


E = E E ^ - AP^t^'^))' 


teTo 


(31) 


k—1 i^V 


where AP^(t('=)) = P^(t('=)) - P|(t('=+^)), and AP^(t('=)) is an analogical empirical quantity from the data. Note that 
for A = 1, APg(t) = and J" = J. In this scheme, the couplings are updated according to ^ 

for t S To and fixed learning rate e, where 


_ dr it) 

do^rs 

The derivative of Pg{t) reads: 


[AP^(t('=))-AP^(t('=)) 


iGV 


doirs dcXj-s / 


dP^sit) 

dcXrS 


k^di l^di\k 


(32) 


(33) 


where obeys the same set of dynamic message-passing equations (22|-(23) as before. 


Numerical results 


The results for the case of missing information are presented in the Fig. for a tree network (a)-(b) and for 
a connected component of a power-law network (c)-(d) with A = 20 nodes. We chose uniform initial conditions 
aij = 0.5 V(ij) G E for the DMP algorithm in this case, although one could try to combine it with a ML starting 
point, as we have done for the case of the nodes with missing information. Again, HTS demonstrates a clearly better 
performance, but at an expense of a very large computational time (of order of a day for A = 3 and M = 10^) with 
respect to the DMP algorithm, which converges in less than a minute for such a small network. 

Obviously, in the case of mixed partial observations (including nodes with hidden information and missing times), 
we simply have to eliminate the hidden nodes in the corresponding sum in the equation (31); we do not present results 


in this case in order to clearly draw a distinction between the effect of two situations. Finally, let us note that the 
distribution of observation times does not need to be uniform: if each observation is costly, one could optimize over 
the best subset of measurement times still requiring a reasonable reconstruction quality, an idea often used in the 
field called active learning. 
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FIG. 4. (Color online) Test for the case of incomplete information in time. The results are presented for different values of 
parameters M and A for (a), (b) a small tree network with N = 2Q nodes, and (c), (d) a connected component of a power-law 
network with A = 20 and containing cycles of short length. 


Results for the real-world networks 


In this section, we present some supplementary numerical studies of reconstruction algorithms for different real- 
world networks. In the Fig. we represent an average reconstruction error on the Sampson’s monastery network 
[551155] for different H (up to eight hidden nodes which locations are depicted in the Fig. (b)) with M = 10^ 
cascades. As highlighted in the main text, the DMP algorithm demonstrates a competitive quality of reconstruction 
with a major gain in the computational speed. Essentially, the algorithms based on the maximization of the incomplete 
likelihood are not applicable even to the networks of a moderate size, such as the network of retweets with N = 96 
nodes [55]. Hence, only results for the DMP*-|-ML algorithm are presented for this example of a real social network, 
cf. Fig. the outcome for this network is quite reasonable, with an accuracy comparable to the one obtained in the 
study of the Sampson’s monastery network. 



FIG. 5. (Golor online) (a) Comparison of reconstruction performance of MLE, DMP*-|-ML and HTS algorithms for a real-world 
Sampson’s monastery network with A = 18 nodes [351136] for different numbers of hidden nodes H but fixed number of cascades 
M = lO’^. (b) Visualization of topology of the network obtained with Gephi [57]; a darker color indicates the location of nodes 
for which the information is missing. 

Let us nonetheless also provide an example of a dense topology on which both algorithms fail, the DMP-based one 
being especially inefficient. An instance of such a topology is given by the network of political alliances and enmities 
among the 16 tribes of Eastern Central Highlands of New Guinea [55], documented by Read (55]. Because of the 
high density of the network, the hidden nodes introduce a high degeneracy of the possible solution and play a more 
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FIG. 6. (Color online) (a), (b) Reconstruction performance of the DMP*+ML algorithm on a network of retweets with = 96 
nodes [^. (c) Topology of the social network with a darker color indicating the location of 48 (half of the total number of 
vertices) randomly chosen hidden nodes. 


important role compared to the case of sparse networks, so that when the fraction of hidden nodes is of order of a half 
of the network, the reconstruction algorithms do not yield any useful information on the couplings (the demonstrated 
error is of the same order as the one at the initial point, where all transmission probabilities are set to 0.5). 



FIG. 7. (Color online) (a), (b) Reconstruction performance of DMP*+ML and HTS algorithms on a network representing the 
alliance structure among = 16 tribes of New Guinea [39]. (c) Network topology, visualized with Gephi [S^; a darker color 
indicates the location of nodes with hidden information. 


DMP algorithm for continuous dynamics 


All the results in this Letter have been presented for the discrete-time formulation insofar. In this section, we briefly 
discuss how the same techniques can be easily extended to continuous case. The maximum likelihood estimator has 
been originally suggested for the continuous dynamics of the independent-cascade model [25; our extension of the 
MLE for the case of missing information follows straightforwardly the main text, with discrete sums replaced by the 
integrals. For the DMP algorithm, we can use the continuous version of the DMP equations for the SI model, derived 
for the first time in |40| . An important difference only concerns a choice of the objective function J: as in the case 
of missing information in time, it is more convenient to quantify the mismatch in terms of differences of probabilities 
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Pg{t) at a certain number of discrete times: 


n—1 




t=o 


t=o ieo 


AP^ -T] - API -T 


(34) 


where n is the number of discretization steps in time which should be related to the statistics of activation times in 
M observed cascades, or to the set of observation times in the case of incomplete information in time, while APg{t) 
and APg(t) are defined in a way similar to the case of incomplete time information. 

In the case of constant rates we define the transmission function as = aije~°‘'P. Then the functions 

are computed as follows |iD] : 


pt 


/ dr /,j (t) 

1-Pm n 

Jo 

k£di\j 
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dr I I . 

ykedi\j 


(35) 


In order to compute the dynamic messages we can either integrate the expression (35) numerically, or 

(36) 


transform the equation above into an ordinary differential equation by integrating the last term in (35) by parts: 


k^di\j 


which can be solved numerically starting from initial conditions 9'^~^^(0) = 1. The probabilities Pg(t) are computed 
according to (28) in the continuous case as well. 

The couplings are updated according to <— ai*J — for t e [0,n], At = Tjn, and fixed learning rate 

e, where 
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The derivative of Pg {Tt/n) reads: 


(37) 


(38) 


k^di 


lGdi\k 


where (t) obeys the following ordinary differential equation, obtained by taking the derivative of (36): 


dt 


=+ «aP|(0) ^ p™r^i) n 

l^dk\{i,m} 


m^k\i 

+ 1 [A: = r, i = s] 


_6»fc^*(i)+P|(0) 6»'^'=(t) 

lGdk\i 


(39) 


Perspectives of the structural learning using the DMP algorithm 

In this section we show that the DMP algorithm in its simplest form is not suitable for a reconstruction of the 
network structure. As an example, we simulated M = 10® cascades for T = 10 time steps on a small tree network 
with = 10 nodes; its topology is depicted in the Fig. (a). Precomputed empirical marginal probabilities have 
been given to the DMP algorithm, and since the underlying network is supposed to be unknown, we initialized the 
algorithm on a fully-connected graph with all couplings equal to 0.5. As demonstrated in the Fig. [^(b), the resulting 
DMP-reconstructed network is sparse, but does not have the exact topology of the original graph. Notice that the 
DMP algorithm does exactly what it is supposed to do: it tries to match the marginal probabilities given by the 
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DMP equations with the empirical marginal probabilities from the simulations, see Fig. [^(c). It turns out that the 
algorithm sets the majority of transmission probabilities to zero, reconstructing a sparse directed network that fits in 
the best way to the input data. In order to get an idea about the similarity of the original and the DMP-reconstructed 
networks, we summarize some basic topological parameters of both graphs in the Table |I] 





FIG. 8. (Color online) Results given by the DMP reconstruction algorithm in the case where the underlying network is unknown: 
(a) Topology of the original network (a tree with = 10 nodes), (b) Topology of the reconstructed network, (c) Scatter plot 
of marginal probabilities m*(t): empirical from the simulations m\{t) versus those predicted by DMP mbMp(t)- Thickness of 
network edges are proportional to the (a) true and (b) estimated transmission probabilities, the direction of the edges on the 
graph (a) should be read clockwise from source to target. Both networks are visualized with Gephi m- 

We saw that because of the loss of information due to averaging of original data, the DMP algorithm in its simplest 
form does not lead to a correct reconstruction of the diffusion network, even in the case of fully available observations. 
Let us however outline some situations in which the use of DMP for structural learning would most probably make 
sense, especially in the presence of hidden nodes. In this case, all the algorithms would require some additional 
assumptions on the network structure. If a part of the network is known, this would likely impose an additional 
constraint on the DMP algorithm, forcing it to reproduce the original topology. If no information on the location 
of the hidden node is available, which would be a hard case for all reconstruction algorithms, one could think of 
the use of DMP algorithm with £i regularization to speed up the learning process in the same way as we used the 
DMP algorithm for parameter learning in this Letter. We demonstrated that parameter learning in the presence of 
missing information is already a hard problem even when the underlying network is known; we anticipate that the 
structural learning with latent variables and without any prior information on the network would be even harder from 
the algorithmic point of view. The most direct way to solve this problem would probably be based on the information 
contained in the effective interactions at several time steps created via a marginalization over hidden nodes, and using 
correlations between individual cascades in order to distinguish between different latent nodes. These interesting 
questions are left for the future work. 

However, let us note that the DMP algorithm in its present form could be straightforwardly used for the network 
engineering purposes, i.e. for building a sparse network with some prescribed properties with respect to a given 
cascading process given, for instance, by the average probabilities of activation at certain times. 



Original 

Reconstructed 

average degree 

1.8 

2.2 

average weighted degree 

1.685 

0.583 

graph density 

0.200 

0.244 

diameter 

5 

5 

number of shortest paths 

90 

72 

average path length 

2.82 

2.28 


TABLE I. Comparison of topological parameters between original and DMP-reconstructed graphs. 










